Sinusoidal linking mechanism



A sinusoidal linking mechanism based on similarities of 



A.C. den Brinker, A.W.J. Oomen, P.M.J. de Bout and E. Schuijers



In sinusoidal coding, linking of sinusoidal parameters from subsequent frames is used in order to reduce redundancy. Typically this means differential coding of sinusoidal parameters and/or removal of the phase information along tracks.

A proposal for a linking criterion is made which is based on an error measure of the instantaneous signal parameters at the seam. This proposal can also be applied to currently investigated extensions of sinusoidal track models.



The linking criterion has changed over time. The first proposal concerned linking based on a frequency distance [1, 2]. Later, relative frequency distances were proposed as the basis for a criterion [3], and a combination of relative distances in frequency and amplitude as well [4]. We note that none of these methods uses the phase information.

A more elaborate way of establishing tracks was proposed in an earlier publication, where partial signals are reconstructed based on all possible parameter links and these are compared with the original signal. The drawback of this algorithm is its computational burden but, in contrast to the previous mechanisms, it does take the phase information into account.

Extensions of the sinusoidal model have been proposed. Next to the estimation of frequency, amplitude and phase (or derivatives thereof), other free parameters are involved.

The problem for which this invention provides the solution

All the above-mentioned methods have problems. As already mentioned, the first set of methods [1, 3, 4] does not use phase information and may therefore produce incorrect tracks. The second method is computationally expensive. Lastly, linking mechanisms for extended sinusoidal models have not been proposed yet.

A linking mechanism is proposed which uses phase information, is computationally less expensive than the second method, and can also be applied to extensions of the sinusoidal model. The basic idea is to consider the constituent complex signals of two subsequent frames and to base the similarity measure on the instantaneous signal values and the instantaneous stride of these complex signals at the seam.



Embodiment 

In sinusoidal modelling, the models are typically of the form (or can be rewritten as)

$$s(t) = \sum_{m} u_m(t), \qquad (1)$$

where $u_m(t)$ are the underlying sinusoidal or sinusoidal-like signals.

If we consider two subsequent signal segments, $s_1$ and $s_2$, there is typically overlap in their support. In order that profitable (in a coding sense) links are established, it seems reasonable to speak of a link between a component $m$ from $s_1$ and a component $n$ from $s_2$ only if $u_m(t)$ and $v_n(t)$ are similar within the overlap area.

Method 1

First, consider the complete overlap. The aim is to identify signals in this overlap which are similar. This can be done by a correlation method. We define the correlation coefficient $\rho_{mn}$ by

where $u_m$ ($m = 1, \ldots, M$) and $v_n$ ($n = 1, \ldots, N$) are the sets of functions we want to consider, $w$ is a window function, and $E_v$ is the energy in signal $v$ (and likewise $E_u$ for $u$).

$\rho_{mn}$ is a complex value which, for a link, should be close to 1. Therefore, we can build a (partial) similarity measure as before. Thus,



$$S_1(m,n) = \begin{cases} 1 - \dfrac{|\rho_{mn} - 1|}{D_1}, & |\rho_{mn} - 1| < D_1,\\ 0, & \text{elsewhere}, \end{cases} \qquad (4)$$

with $0 < D_1 < 1$.

Additionally, the equivalence in amplitude (or, more particularly, in energy) can be taken into account by considering

Again, for a link, $R$ should be a value close to 1 (but now real-valued) and we propose as similarity measure

with $0 < D_2 < 1$.

As an overall similarity measure this leads to

$$S(m,n) = S_1(m,n)\,S_2(m,n), \qquad (7)$$

where $S(m,n) = 0$ means that there is no link, and the larger $S(m,n)$ is, the more likely it is that this can be exploited profitably as a link in a sinusoidal coding scheme.

If signal $s_1$ is approximated by $M$ components and $s_2$ by $N$, we can construct an $M \times N$ matrix and from its entries establish whether links exist and which are the most profitable ones.
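Equations (2), (5) and (6) are not legible, so the sketch below fills them with assumptions: $\rho_{mn}$ is taken as the windowed inner product of $u_m$ and $v_n$ normalised by the square root of their windowed energies, $R$ as the energy ratio $E_u/E_v$, and $S_2$ as the same piecewise-linear form as $S_1$. The greedy rule for picking links from the $M \times N$ matrix is likewise hypothetical; the text only says that links are established from the matrix entries.

```python
import cmath
import math


def windowed_energy(x, w):
    """Assumed energy definition: sum over t of w(t) |x(t)|^2."""
    return sum(wt * abs(xt) ** 2 for xt, wt in zip(x, w))


def rho(u, v, w):
    """Assumed correlation coefficient: normalised windowed inner product."""
    num = sum(wt * ut * vt.conjugate() for ut, vt, wt in zip(u, v, w))
    return num / math.sqrt(windowed_energy(u, w) * windowed_energy(v, w))


def tent(dist, d):
    """Piecewise-linear similarity as in Equation (4): 1 - dist/d, or 0."""
    return 1.0 - dist / d if dist < d else 0.0


def similarity_matrix(us, vs, w, d1=0.5, d2=0.5):
    """M x N matrix of S(m, n) = S1(m, n) * S2(m, n), Equation (7)."""
    matrix = []
    for u in us:
        row = []
        for v in vs:
            s1 = tent(abs(rho(u, v, w) - 1.0), d1)
            r = windowed_energy(u, w) / windowed_energy(v, w)  # assumed R
            row.append(s1 * tent(abs(r - 1.0), d2))
        matrix.append(row)
    return matrix


def greedy_links(matrix):
    """Hypothetical rule: repeatedly accept the largest remaining S(m, n) > 0."""
    entries = sorted(((s, m, n) for m, row in enumerate(matrix)
                      for n, s in enumerate(row) if s > 0), reverse=True)
    used_m, used_n, links = set(), set(), []
    for _, m, n in entries:
        if m not in used_m and n not in used_n:
            links.append((m, n))
            used_m.add(m)
            used_n.add(n)
    return sorted(links)


# Usage: complex components of s1 (us) and s2 (vs) over a 64-sample overlap.
L = 64
w = [0.5 - 0.5 * math.cos(2 * math.pi * t / L) for t in range(L)]


def tone(k, amp, phase):
    return [amp * cmath.exp(1j * (2 * math.pi * k * t / L + phase)) for t in range(L)]


us = [tone(5, 1.0, 0.0), tone(12, 0.5, 1.0)]
vs = [tone(12, 0.5, 1.05), tone(5, 1.0, 0.02)]
S = similarity_matrix(us, vs, w)
print(greedy_links(S))  # [(0, 1), (1, 0)]
```

Because $\rho_{mn}$ is normalised, it only reflects phase and shape; the energy ratio is what penalises a component that matches in phase but not in level.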

Method 2 

To simplify the procedure outlined above, we look specifically at the middle of the overlap region. At this point (let's say $t_0$) we should have

$$u_m(t_0) \approx v_n(t_0). \qquad (8)$$

We would like the signals to match in the neighbourhood of $t_0$ as well. This is realised if the progression (the stride) in the signals is (nearly) the same, e.g., evaluated by

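In order to select links we now propose the (partial) similarity measure

$$S_3(m,n) = \begin{cases} 1 - \dfrac{1}{D_3}\left|\dfrac{u_m(t_0)}{v_n(t_0)} - 1\right|, & \left|\dfrac{u_m(t_0)}{v_n(t_0)} - 1\right| < D_3,\\ 0, & \text{elsewhere}, \end{cases} \qquad (10)$$

with $0 < D_3 < 1$. We note that the amplitude similarity is involved in a relative way. This agrees with psycho-acoustic relevance and distance criteria.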
The second partial similarity measure is defined as 

$$S_4(m,n) = \begin{cases} \ldots \\ 0, & \text{elsewhere}, \end{cases} \qquad (11)$$

with $0 < D_4 < 1$.

The total similarity measure is defined as

$$S(m,n) = S_3(m,n)\,S_4(m,n). \qquad (12)$$
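Equations (9) and (11) are not legible, so this sketch of Method 2 assumes that the stride is the one-sample progression $u(t_0+1)/u(t_0)$ and that $S_4$ compares the two strides with the same piecewise-linear form as $S_3$; $S_3$ is taken as the relative deviation of $u_m(t_0)/v_n(t_0)$ from 1. Function names and tolerances are hypothetical.

```python
import cmath


def tent(dist, d):
    """Piecewise-linear similarity: 1 - dist/d inside the tolerance, else 0."""
    return 1.0 - dist / d if dist < d else 0.0


def method2_similarity(u, v, t0, d3=0.5, d4=0.5):
    """S(m, n) = S3(m, n) * S4(m, n) at the overlap midpoint t0 (sketch)."""
    s3 = tent(abs(u[t0] / v[t0] - 1.0), d3)        # values match, relatively
    stride_u = u[t0 + 1] / u[t0]                   # assumed stride definition
    stride_v = v[t0 + 1] / v[t0]
    s4 = tent(abs(stride_u / stride_v - 1.0), d4)  # progressions match
    return s3 * s4


u = [cmath.exp(0.3j * t) for t in range(21)]
v = [1.1 * cmath.exp(0.3j * t) for t in range(21)]  # same track, 10% louder
x = [cmath.exp(0.9j * t) for t in range(21)]        # different frequency
print(round(method2_similarity(u, v, 10), 3))  # 0.818
print(method2_similarity(u, x, 10))            # 0.0
```

In the second case $S_3$ alone would still be about 0.44, because the two phasors happen to be close at $t_0$; it is the stride term that rejects the link, which is why both partial measures are combined.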



It is obvious that instead of looking at (relative) differences between complex values $u$, we can also look at the real and imaginary parts, or the amplitude and phase, of $u$ and construct the similarity criterion from these. This has the advantage that instead of the two parameters that control the above similarity measure, we get one or more parameters per considered variable. Therefore, expressed in real parameters instead of complex ones, we typically end up with twice as many parameters.



Application areas

Audio and speech coding. 
In the following, an embodiment of the invention is described.

Sinusoidal Coding (SSC) is a parametric audio coding technique, developed at Philips Research, aimed at a bit-rate of approximately 40 kbit/s for high-quality stereo audio. The SSC coder divides audio into three objects: transients, sinusoids and noise. For each object, relevant parameters are extracted and efficiently encoded into a bit-stream.

One of the most important steps for the compression is the tracking algorithm. The tracks formed by this algorithm can be encoded very efficiently using differential encoding. Furthermore, phaseless reconstruction can be applied to further improve the coding gain. As this is not a lossless process, the course of the tracks actually determines the quality. It is shown how, using the information of 2nd order polynomials, the linking of individual sinusoids is improved.

Furthermore, a psycho-acoustic model is applied in the algorithm that removes the non-tracks, to further increase the audio quality.



AUDIO CODING 

Audio coding is the process of encoding and decoding digitised audio signals. This is done in such a manner that after encoding the data rate is kept as low as possible, while as much as possible of the original quality is maintained after decoding. Mainly thanks to the Internet, audio coding is also publicly known, in particular the MPEG-1 Layer III standard, better known as MP3. This compression scheme delivers high-quality audio at compression gains of over a factor of 10. Since the standardisation of MPEG-1, many new and innovative ideas have been brought forward. In practice, this led to state-of-the-art MPEG-4 AAC coders that provide compression factors of around 15 while still maintaining the same high quality level as MPEG-1 Layer III. The current opinion in the audio coding community is that no further improvement in compression gains is to be expected for waveform coders. There is a general belief that in order to achieve even higher compression gains, audio should be coded parametrically.



PARAMETRIC CODING 

When the input signals are restricted (e.g. to speech only), specific features of the input signal can be exploited to further improve the coding gain. One way of implementing these features in an audio coder is to use a parametric coding scheme. The main difference with a waveform coder is the explicit usage of a source model. For a speech model, for example, this is based on the human vocal tract. For a parametric model of a piano, the hammers, the strings and the cabinet could be considered. Figure 1 depicts a schematic description of a parametric audio coder.
Figure 1: Parametric audio encoder
The advantage of parametric audio coding is the exploitation of both the sender-end and the receiver-end (human auditory system), whereas a waveform coder exploits the receiver-end only. If the input signal, however, does not appropriately fit the source model, this might lead to unpredictable results. This lack of robustness is, in turn, the main disadvantage of parametric audio coding.

A further advantage of parametric audio coding is the perceptual model. This model can be adapted fully to the source model. If, e.g., one object of the source model is defined as a set of harmonics, perceptual experiments using harmonics could be performed in order to arrive at a psycho-acoustic model for harmonics only.

Parametric models are also often called object-oriented models. Objects can be rather abstract, like 'tonal elements' and 'non-tonal elements' for speech, but also concrete, like 'string-generated harmonics' in a guitar model. A description in terms of different objects automatically implies that, within the encoder, decisions will have to be made about which part of the signal must be assigned to which object. The more such decisions have to be made, the worse the robustness will probably be.

All in all, parametric coding has more potential than waveform coding as far as bit-rate versus perceived audio quality is concerned. But especially when designing a parametric model for a broader range of input signals, one has to weigh robustness against coding efficiency.



SINUSOIDAL CODING 



The SSC coder is a parametric audio coder that aims at coding audio in a broad sense, i.e. not restricted to a classified source such as speech. Therefore a source model must be developed that is both simple and effective. We have classified audio into three objects, namely sinusoids, transients and noise. Such a description seems to be both simple and complete.

Another reason for choosing these objects as the base for a parametric audio model is the apparent relation to psycho-acoustics. When calculating the masking curve, for example, most perceptual models of waveform coders make a distinction between tonal elements (sinusoids) and non-tonal elements (noise). The masking effect of sinusoidal tones is different from the masking effect of noise.

The SSC codec is based on the three objects described above, namely transients, sinusoids and noise. A block diagram of the SSC encoder is shown in Figure 2.
Figure 2: Block diagram of the SSC encoder

One of the most crucial steps in parametric audio coding is the assignment of different parts of the signal to different objects. For both the analysis of sinusoidal components and the analysis of noise, quasi-stationarity is a prerequisite. So if transient phenomena are removed first, the residual signal will be more stationary and thus easier to analyse. This is the main reason why the transients module (T) has been placed in front of the other two. The sinusoidal module (S) is placed before the noise module (N) because it is much harder to analyse and remove the noise from the residual signal of the transient module than it is the other way around.



SINUSOIDS 

The SSC project is based on the assumption that any digital audio signal can be described adequately by (Equation 13):

It is, however, not clearly defined what a sinusoid is. Within the SSC project, elements that are classified as a sinusoid can be described by (Equation 14):



where $A_p(n)$ is the slowly varying amplitude, $\omega_p(n)$ is the slowly varying frequency and $\phi_p(n)$ the phase of the sinusoid. This representation was first used by McAulay and Quatieri for describing speech signals [McAulay and Quatieri 1986].

Because of complexity, it is neither efficient nor feasible to extract $A_p(n)$, $\omega_p(n)$ and $\phi_p(n)$ on a sample-by-sample basis. A more feasible method is to extract these parameters on a frame-to-frame basis. So for a single frame the sinusoids could be described by:

$$s_k(n) = \sum_{p} A_{p,k}\cos\left(\omega_{p,k}\,n + \phi_{p,k}\right), \qquad (15)$$

where $k$ indexes the frame and $p$ the $p$-th sinusoid.
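As a minimal sketch of the frame-based description, the function below synthesises one frame from a list of $(A_{p,k}, \omega_{p,k}, \phi_{p,k})$ triples. It assumes cosine sinusoids with $\omega$ in radians per sample and a time index running symmetrically around the frame centre, as is done later for phaseless reconstruction; the function name is hypothetical.

```python
import math


def synth_frame(params, N=720):
    """Synthesise one frame as a sum of constant-parameter sinusoids.

    params: list of (A, omega, phi); omega in radians per sample.
    The time index n runs from -N/2 to N/2 - 1 (centred frame).
    """
    half = N // 2
    return [sum(A * math.cos(w * n + phi) for A, w, phi in params)
            for n in range(-half, N - half)]


frame = synth_frame([(1.0, 2 * math.pi * 440 / 44100, 0.0),
                     (0.5, 2 * math.pi * 880 / 44100, 1.0)])
print(len(frame))  # 720
```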

In order to arrive at such a description, another segmentation takes place on the PCM input signal. At this segmentation stage the space between two transient positions is divided into overlapping frames of 720 samples. This number has been determined experimentally as a balance between stationarity and efficient coding of parameters. The segmentation is illustrated in Figure 3. The upper line describes the transient positions extracted by the transients module. The small blocks at the lower end show how the sinusoids are segmented between these transient positions. Note that the last frame of a segment is always placed in such a way that it ends just before the next segment starts. This is again because of stationarity, as described before by the first transient property.
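A minimal sketch of this segmentation rule, assuming a hop of 360 samples (50% overlap; the overlap is not stated in this section) and that a segment shorter than one frame yields no frames, which is a hypothetical choice. Frames advance from the first transient position, and the last frame is shifted back so that it ends exactly where the next segment starts.

```python
def segment_frames(start, end, frame_len=720, hop=360):
    """Start positions of overlapping frames covering [start, end).

    start, end: consecutive transient positions in samples.
    """
    if end - start < frame_len:
        return []  # hypothetical handling of very short segments
    starts = list(range(start, end - frame_len + 1, hop))
    if starts[-1] + frame_len != end:
        starts.append(end - frame_len)  # last frame ends just before the next segment
    return starts


print(segment_frames(0, 2000))  # [0, 360, 720, 1080, 1280]
```

The shifted last frame overlaps its predecessor by more than the regular hop, so no frame ever straddles a transient.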
Figure 3: Segmentation of transients and sinusoidal module
Even more coding gain can be achieved by applying phaseless reconstruction. This means that instead of updating the phase of a track at each frame, only the phase at the birth of a track is encoded. For all following frames of the track the phase is calculated based on the presumption that the instantaneous frequency is a smooth and slowly varying function of time.

Finally, the sinusoidal quantisation (SQ) block codes the processed parameters to further increase coding gain. It does so by quantising and coding the parameters extracted for frequency, amplitude and phase.

Sinusoidal Analysis

The sinusoidal analysis block consists of an iterative algorithm for extracting the sinusoidal parameters. The block diagram is depicted in Figure 5.
Figure 5: Extraction of sinusoidal parameters



During the first iteration a segment of data (frame) is presented to the sinusoidal extraction block. The FFT is determined, after which the maximum amplitude of the FFT is sought. A rectangular window is used because such a window has the smallest main-lobe width. However, the main disadvantage of a rectangular window is its poor sidelobe attenuation, which causes spectral smearing. It is because of this smearing that the extraction process has to be done in an iterative way.

For the maximum found in the FFT, a fine search by means of interpolation is made in order to precisely extract the frequency of the underlying sinusoid. When the frequency has been determined, the optimal amplitude and phase can be determined by linear regression. Finally, the sinusoid is generated using the extracted parameters and subtracted from the original segment. This process is repeated so that finally fifty frequencies with accompanying amplitudes and phases are extracted per frame. Note that this method of extraction does not consider whether a spectral peak is actually the result of a sinusoid.
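The iterative analysis-by-synthesis loop can be sketched with NumPy as below. The text does not specify the interpolation method, so parabolic interpolation on a zero-padded magnitude spectrum is an assumed choice; the linear regression is an ordinary least-squares fit of a cosine and a sine at the refined frequency. Function and parameter names are hypothetical.

```python
import numpy as np


def extract_sinusoids(frame, num=50, pad=8):
    """Iteratively extract (A, omega, phi) with x[n] ~ sum A cos(omega n + phi).

    Per iteration: rectangular-window FFT (zero-padded by `pad`), largest peak,
    parabolic refinement of its frequency (assumed interpolator), least-squares
    amplitude and phase, subtraction from the residual.
    """
    residual = np.asarray(frame, dtype=float).copy()
    N = len(residual)
    n = np.arange(N)
    nfft = pad * N
    params = []
    for _ in range(num):
        mag = np.abs(np.fft.rfft(residual, nfft))
        k = int(np.argmax(mag[1:-1])) + 1  # skip the DC and Nyquist bins
        left, mid, right = mag[k - 1], mag[k], mag[k + 1]
        offset = 0.5 * (left - right) / (left - 2 * mid + right)
        omega = 2 * np.pi * (k + offset) / nfft
        basis = np.column_stack([np.cos(omega * n), np.sin(omega * n)])
        (p, q), *_ = np.linalg.lstsq(basis, residual, rcond=None)
        # p = A cos(phi) and q = -A sin(phi) for A cos(omega n + phi)
        A, phi = np.hypot(p, q), np.arctan2(-q, p)
        residual -= A * np.cos(omega * n + phi)
        params.append((float(A), float(omega), float(phi)))
    return params


n = np.arange(720)
x = 1.0 * np.cos(0.3 * n + 0.4) + 0.5 * np.cos(0.9 * n - 1.0)
for A, omega, phi in extract_sinusoids(x, num=2):
    print(round(A, 3), round(omega, 4), round(phi, 3))
```

Subtracting each found sinusoid before the next search is what removes its rectangular-window sidelobes, so that weaker peaks are not masked by smearing.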

The use of a short segment length of only 720 samples implies that the lower frequencies cannot be estimated with high precision. Therefore a multi-scale sinusoidal extraction mechanism has been developed (see Fig. 7). The basic principle of this mechanism is as follows.

First, the PCM input signal is fed to an anti-aliasing filter (AAF), after which the signal is downsampled by a factor of three (DS3). Now the structure of Figure 5 is applied to 720 samples in the downsampled domain (SE). These samples correspond to three times 720 samples in the original domain. An appropriate segment of downsampled PCM samples that corresponds to the samples in the original domain will therefore have to be selected. This is done in the sinusoidal segmentation unit (SU). This segmentation is shown graphically for all three scales in Figure 6. Note that every segment on the third scale corresponds to multiple segments on
Figure 6: Multi-scale segmentation
Figure 7: Multi-scale sinusoidal analysis
For reasons of complexity in both encoder and decoder, a description on a single scale is preferable. To arrive at a single-scale description, the estimated parameters must be converted back to the original, non-downsampled domain. This is done in a few steps. First of all, every scale is band-limited, and thus only a limited range of frequencies may be included. This selection is done in the frequency selection module (FS). For sinusoids that are kept, the amplitude and phase are compensated for the gain and delay caused by the anti-aliasing filter (FC). Finally, the sinusoidal components are converted to the non-downsampled domain by mapping the parameters to the segments (frames) of the previous scale (ST).
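A sketch of the FS and FC steps for one coarse scale, assuming the parameters were estimated as $A\cos(\omega n + \phi)$ on the signal filtered by a known FIR anti-aliasing filter `h` and downsampled by three with a common time origin. Dividing the frequency by the downsampling factor maps it back to the original rate, and dividing by the filter gain and subtracting the filter phase undoes the filter. The band edge is a hypothetical value, the toy filter is not the SSC filter, and the frame mapping (ST) is omitted.

```python
import cmath
import math


def fir_response(h, omega):
    """H(e^{j omega}) = sum_k h[k] e^{-j omega k} of an FIR filter h."""
    return sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))


def to_original_domain(params, h, factor=3, band_edge=0.8 * math.pi):
    """Map (A, omega, phi) from a downsampled scale back to the original rate.

    FS: drop components at or above band_edge (hypothetical value).
    FC: undo the gain and phase of the anti-aliasing filter h.
    The mapping onto frames of the previous scale (ST) is not modelled.
    """
    out = []
    for A, omega_ds, phi in params:
        if omega_ds >= band_edge:
            continue
        omega = omega_ds / factor  # radians per sample at the original rate
        H = fir_response(h, omega)
        out.append((A / abs(H), omega, phi - cmath.phase(H)))
    return out


h = [0.25, 0.5, 0.25]  # toy low-pass filter with a delay of one sample
gain = 0.5 + 0.5 * math.cos(0.2)  # |H| at omega = 0.2; its phase is -0.2
print(to_original_domain([(gain, 0.6, 0.3 - 0.2), (0.1, 3.0, 0.0)], h))
# one component, approximately (1.0, 0.2, 0.3)
```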



Psycho-acoustic model 

The key element in removing irrelevancy in audio coding is the psycho-acoustic model. This model tries to describe, for a given input signal, which parts of that signal can and cannot be perceived by the human auditory system. Psycho-acoustic models can be very comprehensive. They can describe time masking (the masking of parts of the signal over time), frequency masking (the masking of frequency components by each other), stereo masking/unmasking (the masking or unmasking caused by the use of stereo), etc.

The psycho-acoustic model currently used in the SSC encoder only includes a frequency masking model. Every sinusoid can be seen as a frequency component with its own masking ability, determined by its power and frequency. All these components together form a so-called masking curve. This is a curve in the frequency domain that describes the total masking ability of all components present. As an example, the masking curve of three sinusoids with frequencies of 1000, 2000 and 4000 Hz has been calculated (see Fig. 9). Note that the individual masking curves get broader as the frequency gets higher. This effect, like many other laws in psycho-acoustics, is related to the critical band concept. This concept basically states that the human auditory system can be seen as a bank of bandpass filters with different bandwidths. This concept is described by the Equivalent Rectangular Bandwidth (ERB) scale, defined as Equation 16:

where the frequency $f$ is in hertz and $e(f)$ is the frequency in erb. The ERB scale is depicted in Figure 8.
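Equation 16 is not legible. A widely used ERB-rate formula is that of Glasberg and Moore (1990), $e(f) = 21.4\log_{10}(4.37f/1000 + 1)$ with $f$ in hertz; the sketch below assumes that form, which may differ from the exact expression used in SSC.

```python
import math


def hz_to_erb(f_hz):
    """ERB-rate in erb for a frequency in Hz (Glasberg & Moore form, assumed)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_to_hz(e):
    """Inverse of hz_to_erb."""
    return (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


for f in (100, 1000, 2000, 4000):
    print(f, round(hz_to_erb(f), 2))
```

The scale is compressive: one erb spans more hertz at high frequencies, which matches the broadening of the individual masking curves noted above.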
Figure 8: Example of frequency masking of sinusoids

Besides the individual masking curves, the total masking curve is also partially determined by the hearing threshold in quiet. This is a threshold describing how loud a sinusoid has to be in order to be perceived in an absolute sense. In Figure 8 the threshold in quiet is shown as an interrupted line; the total masking curve is shown as a solid line.



PHILIPS CIP NL +31 40 2743489, 16.01.2001



Tracking 

The main function of the tracking algorithm is to improve coding efficiency. The tracking algorithm consists of three steps:

1. Apply tracking: linking of sinusoidal components in time.

2. Phaseless reconstruction: for tracks that have been found, only the initial phase has to be transmitted.

3. Removal of non-tracks: tracks that are very short cannot be perceived as a sinusoidal component; they are therefore removed.

The first step of the algorithm tries to link the sinusoidal components that have been found on a frame-to-frame basis. This process is shown graphically in Figure 10. A sinusoidal component will become one of the following four possibilities:

1. Birth: part of a track with only a successor.

2. Continuation: part of a track with a predecessor and a successor.

3. Death: part of a track with only a predecessor.

4. Non-track: a 'track' consisting of a single frame.

Now the problem arises how to link the sinusoidal components that have been extracted. It is assumed that a track is a slowly varying function of both amplitude and frequency (see Equation 14). Therefore, separate cost-functions for amplitude and frequency have been developed. For the frequency, the cost-function is based on the ERB scale. The reason to do so is that for complex stimuli the relevant events are more or less separated according to the ERB scale. The cost-function for frequency then becomes (Equation 17):



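$$Q^{\mathrm{f}}_{p,q} = \begin{cases} 0, & |e(f_{p,k}) - e(f_{q,k+1})| \ge \Delta_e,\\ 1 - \dfrac{|e(f_{p,k}) - e(f_{q,k+1})|}{\Delta_e}, & |e(f_{p,k}) - e(f_{q,k+1})| < \Delta_e, \end{cases} \qquad (17)$$

where $e(f_{p,k})$ denotes the frequency in erb of component $p$ in the $k$-th frame and $\Delta_e$ the maximally allowed deviation expressed in erb.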

For the amplitudes a similar cost-function is used (Equation 18): 

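$$Q^{\mathrm{A}}_{p,q} = \begin{cases} 0, & |A_{p,k} - A_{q,k+1}| \ge \Delta_A,\\ 1 - \dfrac{|A_{p,k} - A_{q,k+1}|}{\Delta_A}, & |A_{p,k} - A_{q,k+1}| < \Delta_A, \end{cases} \qquad (18)$$

where $A_{p,k}$ denotes the amplitude, expressed in decibels, of sinusoidal component $p$ in the $k$-th frame and $\Delta_A$ the maximally allowed deviation. The total cost-function now becomes (Equation 19):

$$Q_{p,q} = Q^{\mathrm{f}}_{p,q}\,Q^{\mathrm{A}}_{p,q}. \qquad (19)$$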



When for a certain sinusoid $p$ there exists no $Q_{p,q}$ greater than zero, it is marked as the end of a track. If for a certain sinusoid $p$ more than one $Q_{p,q}$ is greater than zero, the sinusoid $q$ with the largest value of $Q_{p,q}$ is chosen.
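The linking rule can be sketched as below, assuming piecewise-linear costs that fall from 1 to 0 at the maximal deviation and a total cost that is their product. The ERB formula (Glasberg and Moore) and the tolerances of 0.5 erb and 6 dB are hypothetical values. As in the text, each sinusoid picks the candidate with the largest positive cost; conflicts where two sinusoids pick the same successor are not resolved here.

```python
import math


def hz_to_erb(f_hz):
    """Assumed ERB-rate formula (Glasberg & Moore)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def tent(dist, max_dev):
    """1 at zero deviation, falling linearly to 0 at the maximal deviation."""
    return 1.0 - dist / max_dev if dist < max_dev else 0.0


def link_frames(prev, cur, max_erb=0.5, max_db=6.0):
    """Choose a successor in frame k+1 for each sinusoid of frame k.

    prev, cur: lists of (frequency_hz, amplitude_db).
    Returns {p: q}; a p without any positive Q_{p,q} ends its track.
    """
    links = {}
    for p, (fp, ap) in enumerate(prev):
        best_q, best_cost = None, 0.0
        for q, (fq, aq) in enumerate(cur):
            q_f = tent(abs(hz_to_erb(fp) - hz_to_erb(fq)), max_erb)  # frequency cost
            q_a = tent(abs(ap - aq), max_db)                          # amplitude cost
            if q_f * q_a > best_cost:                                 # total cost
                best_q, best_cost = q, q_f * q_a
        if best_q is not None:
            links[p] = best_q
    return links


prev = [(1000.0, 60.0), (3000.0, 40.0)]
cur = [(1010.0, 59.0), (2000.0, 40.0), (5000.0, 40.0)]
print(link_frames(prev, cur))  # {0: 0}; sinusoid 1 ends its track
```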
Figure 10: Tracking of sinusoidal components (horizontal axis: time in segments)

The second step of the algorithm consists of phaseless reconstruction. Phaseless reconstruction is based on the assumption that the instantaneous phase of a track is a smooth function of time. For two consecutive frames of a track the instantaneous phases are equated as follows. Assume that the sinusoids in frame $k-1$ and frame $k$ are described as (Equations 20 and 21):

$$x_{k-1}(n) = A_{k-1}\cos\left(\omega_{k-1}\,n + \phi_{k-1}\right), \qquad (20)$$

$$x_k(n) = A_k\cos\left(\omega_k\,n + \phi_k\right). \qquad (21)$$
Equating the instantaneous phases is best performed at the middle of the overlapping segments, because the synthesis windows are (symmetric) Hanning windows. The overlap is shown graphically in Figure 11 for a segment length of $N$. Note that $n$ is defined symmetrically around zero for both Equation 20 and Equation 21; shifted time axes are thus used.






Figure 11: Equating the instantaneous phase at overlapping frames

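Equating the instantaneous phases of Equations 20 and 21 at the middle of the overlap of a segment with length $N$ then gives (Equation 22):

$$\omega_{k-1}\frac{N}{4} + \phi_{k-1} = -\omega_k\frac{N}{4} + \phi_k, \qquad (22)$$

which results in (Equation 23):

$$\phi_k = \phi_{k-1} + \frac{N}{4}\left(\omega_{k-1} + \omega_k\right). \qquad (23)$$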
Equation 23 already indicates that the first step of the tracking algorithm, the linking procedure, has a great influence on quality: since each phase is derived from the previous one, an erroneous link propagates a phase error along the rest of the track. Erroneous linking of tracks can therefore seriously distort phase relations between tracks.
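Phaseless reconstruction along one track can be sketched from the recursion $\phi_k = \phi_{k-1} + \frac{N}{4}(\omega_{k-1} + \omega_k)$, assuming frames of the form $A\cos(\omega n + \phi)$ with $n$ centred on each frame and 50% overlap, so that the overlap midpoint lies at $n = N/4$ of frame $k-1$ and at $n = -N/4$ of frame $k$. Only the birth phase is needed; the function name is hypothetical.

```python
import math


def phaseless_phases(omegas, phi_birth, N=720):
    """Frame phases of a track from its birth phase and frame frequencies.

    Applies phi_k = phi_{k-1} + (N/4)(omega_{k-1} + omega_k), which equates
    the instantaneous phases at the overlap midpoint, and wraps to [-pi, pi].
    """
    phis = [phi_birth]
    for w_prev, w_cur in zip(omegas, omegas[1:]):
        phi = phis[-1] + (w_prev + w_cur) * N / 4.0
        phis.append(math.remainder(phi, 2 * math.pi))
    return phis


# A track whose frequency rises slightly from frame to frame.
print(phaseless_phases([0.1, 0.12, 0.15, 0.15], 0.2))
```

For a constant frequency the recursion advances the phase by $\omega N/2$ per frame, i.e. exactly the phase progression of one continuous sinusoid over a hop of $N/2$ samples.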

The final step of the tracking algorithm consists of the removal of short tracks. Tracks that are shorter than five sinusoidal periods cannot be perceived by the human auditory system as a tonal component. Such tracks are therefore deleted.
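A sketch of the five-period rule, assuming a track of $K$ frames lasts about $K$ hops of 360 samples and using the mean frequency of the track in radians per sample; both the hop and this duration model are assumptions. At 44.1 kHz, for instance, a 50 Hz track then needs at least 13 frames, while a 1 kHz component already exceeds five periods within a single hop.

```python
import math


def periods_in_track(num_frames, omega, hop=360):
    """Number of sinusoid periods covered by a track of num_frames frames."""
    return num_frames * hop * omega / (2 * math.pi)


def remove_short_tracks(tracks, hop=360, min_periods=5.0):
    """Keep tracks (lists of per-frame omegas) lasting at least min_periods."""
    return [t for t in tracks
            if periods_in_track(len(t), sum(t) / len(t), hop) >= min_periods]


fs = 44100
w50 = 2 * math.pi * 50 / fs
w1k = 2 * math.pi * 1000 / fs
kept = remove_short_tracks([[w50] * 12, [w50] * 13, [w1k]])
print(len(kept))  # 2
```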

Coding of sinusoidal components

The coding of the sinusoidal components consists of:

1. Quantisation of parameters.

2. Sorting data into births and continuations.

3. Sorting births by frequency.

4. Sorting continuations by frequency (where deaths are also seen as continuations).

5. Applying absolute and differential coding.
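The sorting and the absolute/differential split can be sketched for one frame as below. The data layout is hypothetical: each quantised component carries the index of its predecessor in the previous frame, or `None` for a birth. Births are sorted by frequency and coded absolutely; continuations (deaths included) are coded as differences to their predecessor and, as an assumption, ordered by the predecessor's frequency so that a decoder that already knows the previous frame can tell which track each difference belongs to.

```python
def code_frame(cur, prev):
    """Split one frame into absolutely coded births and differential continuations.

    cur:  list of (freq_q, amp_q, pred), pred = index into prev or None (birth).
    prev: list of (freq_q, amp_q, pred) for the previous frame.
    Returns (births, continuations): births as sorted (freq_q, amp_q) pairs,
    continuations as (d_freq, d_amp) ordered by the predecessor's frequency.
    """
    births = sorted((f, a) for f, a, pred in cur if pred is None)
    conts = sorted(
        (prev[pred][0], f - prev[pred][0], a - prev[pred][1])
        for f, a, pred in cur if pred is not None
    )
    return births, [(df, da) for _, df, da in conts]


prev = [(100, 20, None), (250, 18, None)]
cur = [(255, 17, 1), (102, 21, 0), (400, 10, None), (50, 12, None)]
print(code_frame(cur, prev))  # ([(50, 12), (400, 10)], [(2, 1), (5, -1)])
```

The differences of continuations cluster around zero, which is what makes the subsequent entropy coding effective.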

The quantisation of the parameters is done according to the same rules as for the sinusoids in the transient coder.

In order to efficiently code the parameters and the tracking information, the matrices containing the sinusoidal components' frequency, amplitude and phase are sorted. This is shown graphically in Figure 12. The left matrix shows how the information is stored before sorting; the right matrix shows how the information is stored after sorting. For births, amplitude and frequency are coded absolutely; for continuations, they are coded differentially.




Figure 12: Sorting of births and continuations (horizontal axes: frame number)



Sinusoidal Synthesis 

The synthesis of the sinusoidal components differs from the analysis in the windowing that is used. In the analysis section, the overlapping frames are described as (Equation 24):

$$x_k(n) = \sum_{p} A_{p,k}\cos\left(\omega_{p,k}\,n + \phi_{p,k}\right), \qquad (24)$$

where $p$ denotes the sinusoid index and $k$ the frame index.

In the synthesis section the overlapping ftames are synlhesised as (Equation 25): 



where $h[n]$ denotes the window function and $\cdot$ denotes multiplication. The windows used during synthesis are complementary. Three types of windows are used during synthesis:

1. Normal window, defined as a Hanning window (Equation 26):



where $N$ denotes the frame length.

2. Start window, defined as half a rectangular window and half a Hanning window (Equation 27):

3. Stop window, defined as half a Hanning window and for the rest a rectangular window (Equation 28):

where $T$ denotes the length of the frame needed.

Only the frames at the edges of transients use start and stop windows (frames 1 and 41 in Figure 6); all others use normal windows (frames 2 to 40 in Figure 6). A typical windowing sequence is depicted in Figure 13.




Figure 13: Typical synthesis windowing sequence (start window, normal windows, stop window along the time axis)
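Equations 26 to 28 are not legible, so this sketch assumes a periodic Hanning window of length $N$ and start/stop windows that replace the outer half by ones, ignoring the variable length $T$ of the stop window. With 50% overlap (also an assumption), a start window, normal windows and a stop window then sum to exactly one, which is what makes the windows complementary.

```python
import math


def hann(N):
    """Periodic Hanning window of length N (N even)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def start_window(N):
    """Assumed start window: ones, then the falling half of the Hanning window."""
    return [1.0] * (N // 2) + hann(N)[N // 2:]


def stop_window(N):
    """Assumed stop window: rising half of the Hanning window, then ones."""
    return hann(N)[:N // 2] + [1.0] * (N - N // 2)


def overlap_add(windows, hop):
    """Sum equal-length windows placed `hop` samples apart."""
    N = len(windows[0])
    total = [0.0] * (hop * (len(windows) - 1) + N)
    for i, w in enumerate(windows):
        for n, value in enumerate(w):
            total[i * hop + n] += value
    return total


N = 720
sequence = [start_window(N)] + [hann(N)] * 3 + [stop_window(N)]
total = overlap_add(sequence, N // 2)
print(min(total), max(total))  # both 1.0 up to rounding
```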



QUANTISATION AND CODING 

Until now the focus was mainly on the extraction and processing of the parameters towards an efficient representation. It was also shown how irrelevancy is exploited within the SSC coder. To further improve coding efficiency, not only irrelevancy but also redundancy should be exploited.

To efficiently encode all the extracted and processed parameters into a bit-stream, three steps are required:

1. Pre-processing (sorting).

2. Quantisation.

3. Entropy coding.

First of all, the parameters need to be pre-processed. This is mainly a sorting process in which the different parameters are grouped together to form sets that can each be represented efficiently. For example, the sorting of births and continuations described above belongs to the pre-processing stage, but so does the sorting of sinusoidal components by frequency in the transient module.

In the second stage, quantisation is applied to the sorted parameters. The parameters must be quantised as coarsely as possible while, after decoding, no difference can yet be perceived between the signal before and after quantisation. For the amplitudes of the sinusoids, for example, logarithmic quantisation can be applied.
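Logarithmic amplitude quantisation amounts to a uniform quantiser on the decibel scale, so the relative error is bounded by half a step at every level. The 1.5 dB step below is a hypothetical value, not one taken from SSC.

```python
import math


def quantise_amplitude(amp, step_db=1.5):
    """Index of the nearest level on a uniform dB grid (amp > 0)."""
    return round(20 * math.log10(amp) / step_db)


def dequantise_amplitude(index, step_db=1.5):
    """Amplitude represented by a quantisation index."""
    return 10 ** (index * step_db / 20)


for a in (0.01, 0.3, 1.0, 57.3, 1000.0):
    q = quantise_amplitude(a)
    err_db = 20 * math.log10(dequantise_amplitude(q) / a)
    print(a, q, round(err_db, 2))  # |err_db| never exceeds 0.75 dB
```

A linear quantiser with the same number of levels would be far too coarse for quiet sinusoids and needlessly fine for loud ones.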
In the third stage, entropy coding is applied to the representation levels produced by the quantiser. The entropy coder used is Huffman coding. This is a fixed-to-variable wordlength entropy coding technique. The main advantage of Huffman coding over other entropy coders is its low complexity: both encoding and decoding can be performed by table look-up. Furthermore, these tables are easily constructed. The general block diagram for coding audio parameters using Huffman coding is depicted in Figure 14.



Figure 14: General block diagram for efficient encoding of parameters (blocks: process parameters, Huffman table, bit-stream)
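A compact way to build a Huffman table from parameter statistics uses a priority queue, as sketched below with Python's `heapq`. The training data and function names are hypothetical; in a codec the table would typically be built offline and shared by encoder and decoder, so that both sides only perform look-ups.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a prefix-free code table {symbol: bitstring} from example data."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)   # the two least frequent subtrees
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the codewords of the symbols."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Read bits until they form a codeword, then look it up."""
    inverse = {code: s for s, code in table.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out


# Usage: quantised parameter differences, where 0 dominates.
data = [0, 0, 0, 0, 1, 1, -1, 2, 0, 0, 1, 0]
table = huffman_table(data)
bits = encode(data, table)
print(len(bits))  # 19 bits instead of 24 with a fixed 2-bit code
```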



One of the main problems with SSC proved to be quality scalability: from a certain bit-rate on, the quality does not increase any more. This is shown graphically in Figure 15.



Figure 15: Quality scalability of the SSC coder (quality versus bit-rate, with reference levels for transparency and mp3 at 128 kbit/s)
A goal of the invention is to address this problem. 
TRACKING ALGORITHM

The tracking algorithm is essential both in terms of bit-rate reduction and in terms of perceived audio quality. In principle, the linking of individual sinusoids as such only influences the bit-rate. However, since it is inextricably bound up with the phase-less reconstruction within a track, the quality is also affected. Phase-less reconstruction and the removal of non-tracks cause a significant perceptual loss of quality.

In this chapter a new approach to the tracking algorithm is presented, based on the information obtained from the second-order polynomial description of sinusoids. This results in a smaller loss of quality. Adjustments to the algorithm that removes non-tracks are also described.



LINKING OF SINUSOIDAL COMPONENTS 

A goal of the tracking algorithm is to link individual sinusoids between consecutive frames. This has two purposes. First of all, a track is an efficient representation of a sinusoid over time; the frequency and amplitude can, e.g., be coded differentially. Secondly, if tracks represent the true course of the sinusoids over time, the phase information becomes redundant and can therefore be discarded. If, however, incorrect links have been made, this will cause audible discontinuities.

When observing spectrograms, the process of linking seems pretty straightforward. In practice, however, some problems occur. To illustrate which problems can occur, a synthetic signal is generated (Equation 29):

^"1 = Zt^^((^^ ^^X^^^^rf sm(n6)^ ))n\ 29 

where $p$ indexes the sinusoid, $P$ is the total number of sinusoids, $A$ is the amplitude of the fundamental harmonic, $\vartheta$ the frequency of the fundamental harmonic, $\omega_d$ the modulation depth and $\omega_m$ the modulation frequency. Figure 16 shows a part of the
spectrogram of this signal for ^^1000, /^s30, <o-\00F,f27r , <»4 ^l^F,/2pr and 

The signal described by Equation 29 is basically an approximation of a frequency-modulated square wave by thirty sinusoids. As can be seen in Figure 16, the amount of frequency modulation increases over time. Visually, the thirty spectral lines, each representing a single sinusoid, are easily followed. The tracking of this signal thus appears to be straightforward. Two types of problems, however, occur:

1. Wrong connections: two sinusoids of consecutive frames get connected while they should not have been.

2. Non-connections: two sinusoids of consecutive frames do not get connected while they should have been.

Figure 17 shows how the current tracking mechanism, using Equations 17, 18 and 19, connects the extracted sinusoids for a small part of the signal in Figure 16. The solid lines describe tracks, where the crosses denote births (beginnings) and deaths (endings) of tracks.






Figure 16: Spectrogram of synthetic signal
that fall within the circle, the sinusoid that is closest to the centre of the circle will be chosen as the link. In Figure 18 this is depicted using grey values, where a darker grey is preferred over a lighter grey.
Figure 18: 'Straight ahead' property of the current linking mechanism

Besides the 'straight ahead' property, Figure 18 also indicates another weakness of the current linking mechanism: time information is not taken into account during the linking process. The analysis of sinusoidal components took place using 60% overlap. This means that, on average, the matching of sinusoids is best at the middle of the overlap. Ideally, the linking should thus be performed there. The only time information that is available, the phases of the sinusoids, is currently not exploited.

The linking mechanism can be drastically improved using the information obtained from the second order polynomial sinusoid description. For this purpose a sinusoid is written once again as the real part of the sum of three complex vectors (Equation 30):

x[n] = \Re\left\{\left(a + b\,n + c\,n^2\right) e^{j\omega n}\right\} \qquad (30)

This description of a sinusoid is depicted in Figure 19. In this figure the three complex vectors a e^{jωn}, b n e^{jωn} and c n² e^{jωn} of Equation 30 are shown on the complex plane for two consecutive time instances. The projection on the real axis of the total vector, given as the sum of the three complex vectors, describes x[n]. Note that even though each complex vector rotates with frequency ω, the total vector does not necessarily rotate with this same frequency, because b n e^{jωn} and c n² e^{jωn} change length with n while a e^{jωn} does not. This figure thus also illustrates that frequency variation can be handled using the three complex vectors.




Figure 19: Sinusoid as a sum of complex vectors
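As a minimal sketch of Equation 30, the three rotating vectors can be summed and the rotation of the total vector between consecutive samples measured. The function names and the example values of a, b, c and ω are assumptions chosen for illustration.

```python
import cmath


def total_vector(a, b, c, omega, n):
    """Sum of the rotating vectors a e^{j w n}, b n e^{j w n} and c n^2 e^{j w n}."""
    rot = cmath.exp(1j * omega * n)
    return a * rot + b * n * rot + c * n**2 * rot


def x(a, b, c, omega, n):
    """Real signal of Equation 30: projection of the total vector on the real axis."""
    return total_vector(a, b, c, omega, n).real


def rotation_per_sample(a, b, c, omega, n):
    """Angle (radians) the total vector turns between samples n and n + 1."""
    return cmath.phase(total_vector(a, b, c, omega, n + 1) / total_vector(a, b, c, omega, n))


omega = 0.2
print(rotation_per_sample(1.0, 0.0, 0.0, omega, 10))    # constant sinusoid: equals omega
print(rotation_per_sample(1.0, 0.05j, 0.0, omega, 0))   # about 0.25, faster than omega
print(rotation_per_sample(1.0, 0.05j, 0.0, omega, 50))  # about 0.207, close to omega again
```

With b = c = 0 the total vector turns by exactly ω per sample; with a non-zero complex b the turning rate changes over the segment, which is the frequency variation mentioned above.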



The three complex vectors of Equation 30 can also be seen as a single complex vector consisting of the sum of these three vectors (Equation 31):

C(n) = \left(a + b\,n + c\,n^2\right) e^{j\omega n} = R(n)\, e^{j\rho(n)} \qquad (31)

where C(n) denotes the complex vector representing the whole signal, R(n) the envelope function and ρ(n) the instantaneous phase function. As an example, a sinusoid is shown graphically as a single complex vector in Figure 20.




Figure 20: Sinusoid as a single complex vector (axes: n; amplitude, real part)

The upper figure of Figure 20 shows the waveform as a function of time. The lower figure shows complex samples of this same waveform, denoted using arrows. Each arrow describes a sample of the waveform in its complex form (see Equation 31). Figure 20 tries to indicate that the complex description using the parameters instantaneous phase, instantaneous frequency and instantaneous amplitude (envelope) is equivalent to the waveform description. The shape of the waveform of a single sinusoid is thus fully described by Equation 31. The instantaneous amplitude R(n) is the time-dependent equivalent of the amplitude A, the instantaneous phase ρ(n) is the time-dependent equivalent of the phase φ and the instantaneous frequency dρ(n)/dn is the time-dependent equivalent of the frequency ω. The second-order polynomial thus gives extra information on how the amplitude, phase and frequency vary over time within a segment. This extra information can be used to improve the linking mechanism.

In the case a sinusoid is part of a track, there should be a close match of waveforms of sinusoids in consecutive frames. Figure 21 shows two waveforms of overlapping frames with such a close match at the overlap region. A possible linking criterion accounts for the amount of matching found between sinusoids at the overlap region. Equation 31 is ideal for such a purpose because it describes the three independent complex vectors of a single sinusoid as one. The question now remains how to define a matching criterion on C(n).




Figure 21: Waveform matching in overlapping frames (the overlap region is indicated)

Now, fully analogous to the current linking mechanism, the following cost-function has been defined (Equation 32):



32 



where the subscript p denotes the p-th sinusoid of frame k and subscript q denotes the q-th sinusoid of frame k-1. When normal windows are applied, the cost-function is applied at the middle of two overlapping segments. When start- or stop-windows are used, the cost-function is applied at the edges. This is shown graphically in Figure 22. So, instead of the current cost-function, which makes use of global values, instantaneous values are now used.
Figure 22: Linking positions according to segmentation
The cost-function for the instantaneous amplitude has been defined as (Equation 33): 
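c_R(p,q) = \begin{cases} 0 & \text{for } \left|R_{p,k} - R_{q,k-1}\right| \ge R_{\max} \\ 1 - \dfrac{\left|R_{p,k} - R_{q,k-1}\right|}{R_{\max}} & \text{for } \left|R_{p,k} - R_{q,k-1}\right| < R_{\max} \end{cases} \qquad (33)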
where R denotes the instantaneous amplitude expressed in decibels and R_max denotes the maximally allowed deviation in decibels. Both are expressed in decibels to match the human auditory system.
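A minimal sketch of the amplitude partial cost, assuming the piecewise-linear form that is 1 for equal levels, falls linearly to 0 at a deviation of R_max and is 0 beyond it. The default of 12 dB is the threshold value chosen later in this section; the function names are hypothetical.

```python
import math


def level_db(amplitude):
    """Instantaneous amplitude R(n) (linear, > 0) expressed in decibels."""
    return 20.0 * math.log10(amplitude)


def amplitude_score(r_pk, r_qk1, r_max_db=12.0):
    """Partial cost for the instantaneous amplitudes of two sinusoids at the seam.

    Assumed form: 1 for equal levels, decreasing linearly to 0 at a level
    difference of r_max_db, and 0 beyond that.
    """
    diff = abs(level_db(r_pk) - level_db(r_qk1))
    if diff >= r_max_db:
        return 0.0
    return 1.0 - diff / r_max_db


print(amplitude_score(1.0, 1.0))   # 1.0
print(amplitude_score(1.0, 2.0))   # about 0.498 (6.02 dB apart)
print(amplitude_score(1.0, 10.0))  # 0.0 (20 dB apart)
```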

The cost-function for the instantaneous frequency is defined as (Equation 34):



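c_{\omega}(p,q) = \begin{cases} 0 & \text{for } \left|e\big(\omega_{p,k}(n_{L,k})\big) - e\big(\omega_{q,k-1}(n_{R,k-1})\big)\right| \ge e(\omega_{\max}) \\ 1 - \dfrac{\left|e\big(\omega_{p,k}(n_{L,k})\big) - e\big(\omega_{q,k-1}(n_{R,k-1})\big)\right|}{e(\omega_{\max})} & \text{for } \left|e\big(\omega_{p,k}(n_{L,k})\big) - e\big(\omega_{q,k-1}(n_{R,k-1})\big)\right| < e(\omega_{\max}) \end{cases} \qquad (34)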
where e(·) denotes the frequency expressed in ERB. The instantaneous frequency is approximated using a first order difference (Equation 35):

where n_{L,k} denotes the overlap position of frame k at the left-hand side and n_{R,k-1} denotes the overlap position of frame k-1 at the right-hand side, as shown in Figure 22, and ρ_{p,k}(n) denotes the instantaneous phase of the p-th sinusoid in frame k at position n.
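The exact form of Equation 35 (forward or backward difference, and on which side of the seam) cannot be recovered here, so the sketch shows both first-order differences of an instantaneous phase function ρ(n). Wrapping the difference to (-π, π] is an assumption added so that the result is a valid frequency in rad/sample; the function names are hypothetical.

```python
import math


def wrap(phi):
    """Wrap a phase value to the interval (-pi, pi]."""
    return math.pi - (math.pi - phi) % (2.0 * math.pi)


def inst_freq_forward(rho, n):
    """Forward first-order difference rho(n + 1) - rho(n), in rad/sample."""
    return wrap(rho(n + 1) - rho(n))


def inst_freq_backward(rho, n):
    """Backward first-order difference rho(n) - rho(n - 1), in rad/sample."""
    return wrap(rho(n) - rho(n - 1))


def chirp_phase(n):
    """Example phase of a linear chirp whose instantaneous frequency is 0.002 n."""
    return 0.001 * n**2


print(inst_freq_forward(chirp_phase, 100))   # about 0.201
print(inst_freq_backward(chirp_phase, 100))  # about 0.199
```

For a chirp the two differences bracket the true instantaneous frequency (0.2 at n = 100), so the choice matters only at the level of the frequency change per sample.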
The cost-function for the instantaneous phase is defined as (Equation 36): 



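c_{\rho}(p,q) = \begin{cases} 0 & \text{for } \left|\rho_{p,k} - \rho_{q,k-1}\right| \ge \rho_{\max} \\ 1 - \dfrac{\left|\rho_{p,k} - \rho_{q,k-1}\right|}{\rho_{\max}} & \text{for } \left|\rho_{p,k} - \rho_{q,k-1}\right| < \rho_{\max} \end{cases} \qquad (36)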
where ρ_{p,k} and ρ_{q,k-1} are defined as (Equation 37):

\rho_{p,k} = \rho_{p,k}(n_{L,k}), \qquad \rho_{q,k-1} = \rho_{q,k-1}(n_{R,k-1}) \qquad (37)

where n_{L,k} and n_{R,k-1} are the left-hand overlap position of frame k and the right-hand overlap position of frame k-1, as defined above.
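A sketch of the phase partial cost under the same piecewise-linear form as the amplitude cost. Wrapping the phase difference to (-π, π] before comparing it with ρ_max is an assumption; without it, two phases that differ by a multiple of 2π would be judged dissimilar. The default ρ_max = π/3 is the value chosen later in this section; the names are hypothetical.

```python
import math


def wrap(phi):
    """Wrap a phase value to the interval (-pi, pi]."""
    return math.pi - (math.pi - phi) % (2.0 * math.pi)


def phase_score(rho_pk, rho_qk1, rho_max=math.pi / 3.0):
    """Partial cost for the instantaneous phases of two sinusoids at the seam.

    Assumed form: 1 for equal (wrapped) phases, decreasing linearly to 0 at a
    difference of rho_max radians, and 0 beyond that.
    """
    diff = abs(wrap(rho_pk - rho_qk1))
    if diff >= rho_max:
        return 0.0
    return 1.0 - diff / rho_max


print(phase_score(0.1, 0.1 + 2.0 * math.pi))  # about 1.0: same phase after wrapping
print(phase_score(0.0, math.pi / 6.0))         # about 0.5
print(phase_score(0.0, math.pi / 2.0))         # 0.0
```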
Experiments using the partial cost-functions as described above showed that the partial cost-function for the instantaneous frequency (Equation 34) sometimes behaved unpredictably. Some further research showed that this happened especially in areas where the envelope function is close to zero. Figure 23 shows an example of such behaviour. The upper figure shows an amplitude-modulated sinusoid as a function of time. The lower figure shows the instantaneous frequency that belongs to the waveform of the upper figure. As can be seen quite clearly, the instantaneous frequency around position zero becomes negative. In itself this is not a problem; both sinusoids that are to be linked will show such behaviour. Some experiments, however, showed that the matching of waveforms at such critical areas is in general not very good. This especially affects the instantaneous frequency. Therefore the cost-function of the instantaneous frequency was replaced with the cost-function for the (modulation) frequency ω.
Figure 23: Amplitude-modulated sinusoid and its instantaneous frequency
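The behaviour near an envelope zero can be reproduced with the second-order description itself. In the hypothetical example below, P(n) = n + jε, so the envelope |P(n)| has its minimum ε at n = 0 and dρ(n)/dn = ω - ε/(n² + ε²). The instantaneous frequency therefore dips to ω - 1/ε at n = 0, which is negative whenever ε < 1/ω.

```python
def inst_freq(a, b, c, omega, n):
    """d rho/dn = omega + Im(P'(n) / P(n)) for C(n) = (a + b n + c n^2) e^{j omega n}."""
    p = a + b * n + c * n**2
    return omega + ((b + 2 * c * n) / p).imag


# Hypothetical amplitude-modulated sinusoid: P(n) = n + j*eps, so the envelope
# |P(n)| dips to eps at n = 0 while the carrier frequency stays omega.
omega, eps = 0.2, 0.5
a, b, c = 1j * eps, 1.0, 0.0
for n in (-50, -1, 0, 1, 50):
    print(n, abs(a + b * n), inst_freq(a, b, c, omega, n))
```

Far from n = 0 the instantaneous frequency is close to ω = 0.2; at n = 0 it is -1.8, although both the waveform and the carrier are perfectly regular.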

The cost-function for the frequency then becomes identical to the current cost-function for the frequency (Equation 38):



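c_{\omega}(p,q) = \begin{cases} 0 & \text{for } \left|e(\omega_{p,k}) - e(\omega_{q,k-1})\right| \ge e(\omega_{\max}) \\ 1 - \dfrac{\left|e(\omega_{p,k}) - e(\omega_{q,k-1})\right|}{e(\omega_{\max})} & \text{for } \left|e(\omega_{p,k}) - e(\omega_{q,k-1})\right| < e(\omega_{\max}) \end{cases} \qquad (38)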
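A sketch of Equation 38 with frequencies mapped to the ERB scale. The text does not state which ERB formula is used; the ERB-rate scale of Glasberg and Moore (1990), E(f) = 21.4 log10(1 + 0.00437 f), is assumed here, as is a sample rate of 44.1 kHz for converting rad/sample to Hz. The default e(ω_max) = 0.5 ERB is the value chosen later in this section.

```python
import math


def erb_rate(f_hz):
    """ERB-rate scale (Glasberg & Moore, 1990): number of ERBs below f_hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def omega_to_hz(omega, fs):
    """Convert a frequency in rad/sample to Hz for sample rate fs."""
    return omega * fs / (2.0 * math.pi)


def hz_to_omega(f_hz, fs=44100.0):
    """Convert a frequency in Hz to rad/sample for sample rate fs."""
    return 2.0 * math.pi * f_hz / fs


def frequency_score(omega_pk, omega_qk1, fs=44100.0, e_max=0.5):
    """Partial cost for the frequencies of two candidate sinusoids (assumed form)."""
    diff = abs(erb_rate(omega_to_hz(omega_pk, fs)) - erb_rate(omega_to_hz(omega_qk1, fs)))
    if diff >= e_max:
        return 0.0
    return 1.0 - diff / e_max


print(frequency_score(hz_to_omega(1000.0), hz_to_omega(1010.0)))  # about 0.85
print(frequency_score(hz_to_omega(1000.0), hz_to_omega(1200.0)))  # 0.0
```

Measuring the difference in ERB rather than in Hz makes the same threshold more tolerant at high frequencies, in line with the frequency resolution of the ear.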
The partial cost-functions of Equations 33, 36 and 38 contain respectively the thresholds R_max, ρ_max and e(ω_max), which are still to be determined. At first this has been done in such a way that, where tracks can be seen in the spectrogram, links are indeed made. Secondly, an optimisation towards quality has been made. For this purpose the threshold values have been slightly adjusted for an optimum between quality and bit-rate. A smaller amount of links means that more births will occur. Every birth is coded including its original phase, which costs more bits than a continuation. However, while the quality in general will increase, so will the bit-rate. A balance therefore has to be struck between the amount of links and the perceived audio quality. This resulted in e(ω_max) = 0.5 ERB, ρ_max = π/3 radians and R_max = 12 dB.
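With these thresholds, a complete linking step between two frames can be sketched. Two parts of it are assumptions: the three partial costs are combined by multiplication, and sinusoids are linked greedily one-to-one, most similar pair first. Each sinusoid is described by its level in dB, frequency in rad/sample and phase at the seam; the sample rate FS and all names are hypothetical.

```python
import math

# Threshold values from the text; FS is a hypothetical sample rate.
E_MAX_ERB = 0.5          # e(omega_max) in ERB
RHO_MAX = math.pi / 3.0  # rho_max in radians
R_MAX_DB = 12.0          # R_max in dB
FS = 44100.0


def _score(diff, limit):
    """1 at diff = 0, linearly down to 0 at diff = limit, 0 beyond (assumed form)."""
    return 0.0 if diff >= limit else 1.0 - diff / limit


def _erb(omega):
    f_hz = omega * FS / (2.0 * math.pi)
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)  # Glasberg & Moore ERB-rate


def _wrap(phi):
    return math.pi - (math.pi - phi) % (2.0 * math.pi)


def similarity(s_k, s_k1):
    """s = (level_db, omega, rho) of a sinusoid at the seam; product of partial scores."""
    return (_score(abs(s_k[0] - s_k1[0]), R_MAX_DB)
            * _score(abs(_erb(s_k[1]) - _erb(s_k1[1])), E_MAX_ERB)
            * _score(abs(_wrap(s_k[2] - s_k1[2])), RHO_MAX))


def link_frames(frame_k1, frame_k):
    """Greedy one-to-one linking; returns {p in frame k: q in frame k-1}.

    Pairs are visited from most to least similar; pairs with similarity 0 are
    never linked, so unmatched sinusoids of frame k become births.
    """
    pairs = sorted(
        ((similarity(s_k, s_k1), q, p)
         for q, s_k1 in enumerate(frame_k1)
         for p, s_k in enumerate(frame_k)),
        reverse=True,
    )
    used_q, used_p, links = set(), set(), {}
    for sim, q, p in pairs:
        if sim > 0.0 and q not in used_q and p not in used_p:
            links[p] = q
            used_q.add(q)
            used_p.add(p)
    return links


def w(f_hz):
    return 2.0 * math.pi * f_hz / FS


previous = [(60.0, w(440.0), 0.0), (50.0, w(880.0), 1.0)]
current = [(58.0, w(885.0), 1.1), (61.0, w(441.0), 0.05), (40.0, w(3000.0), 0.0)]
print(link_frames(previous, current))  # {1: 0, 0: 1}; sinusoid 2 of frame k is a birth
```

Raising any of the three thresholds makes more pairs score above zero, so fewer births occur and the bit-rate drops; this is the trade-off described above.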

The signal of Figure 16 is tracked once again, but now with the improved linking as described above. The solid lines denote tracks that have been formed; the crosses denote births and deaths of tracks. When compared to Figure 17, the tracking is visibly improved.




CONTINUATION OF PHASE 

Above, the following equation was derived for the continuation of the phase (Equation 39):

\phi_k = \phi_{k-1} + \frac{\omega_{k-1} + \omega_k}{2}\,N \qquad (39)
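A minimal sketch of phase continuation along a track with Equation 39, assuming that N is the update interval (hop size) in samples and that the frequencies are in rad/sample. Only the phase at the birth is needed; all later phases follow from the frequencies. The function name is hypothetical.

```python
def continue_phase(phi_birth, omegas, hop):
    """Phases of a track using phi_k = phi_{k-1} + (omega_{k-1} + omega_k) * hop / 2.

    phi_birth is the (transmitted) phase in the first frame of the track,
    omegas holds one frequency per frame in rad/sample, hop is N in samples.
    Phases are returned unwrapped.
    """
    phases = [phi_birth]
    for w_prev, w_cur in zip(omegas, omegas[1:]):
        phases.append(phases[-1] + 0.5 * (w_prev + w_cur) * hop)
    return phases


print(continue_phase(0.0, [0.1, 0.1, 0.1], 100))  # constant frequency: steps of 0.1 * 100
print(continue_phase(0.0, [0.01, 0.02], 100))     # changing frequency: average is used
```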

This equation was derived from the concept of constant sinusoids. As seen in Chapter 4, frequency variations within a segment can also be modelled. As an approximation, the sinusoid in frame k is then described as (Equation 40):
where Δω_k describes linear frequency variation. The q-th sinusoid in frame k-1 can then be described on the same time axis as Equation 40 as:



41 



Suppose that both sinusoids of Equation 40 and Equation 41 have been linked together. To obtain a smooth transition between the two sinusoids, the instantaneous phases must be equated (Equation 42):



Now suppose the two segments will be linked in the middle of the overlap, as is the general case (Equation 43):

Using Equation 43 the following can be derived (Equation 44): 

For the two terms Δω_{k-1} and Δω_k, different approximations can be used:

• Forward and backward differences (Equation 45):

• Approximations based on second order polynomials

In this embodiment, the approximations based on the second order polynomials are of no use for the decoder, because the second order polynomials will not be included in the bit-stream.

Different combinations of the forward and backward differences, however, might give better results. Therefore they have been evaluated using informal listening tests. For Δω_{k-1} and Δω_k respectively, the following combinations have been evaluated: forward-forward, forward-backward, backward-forward and backward-backward. Note that the combination forward-backward leads to Equation 39. Also note that the forward difference of Δω_k needs future information of the track, while the backward difference of Δω_{k-1} needs history of the track. Both differences can thus only be applied on a limited area of a track.
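The effect of the four combinations can be checked with a sketch under an assumed model, which is not necessarily the exact form of Equations 40 to 44: ρ_k(n) = φ_k + ω_k n + Δω_k n²/(2N), frame k-1 centred N samples earlier, and the phases equated in the middle of the overlap, n = -N/2. This yields φ_k = φ_{k-1} + (ω_{k-1} + ω_k)N/2 + (Δω_{k-1} - Δω_k)N/8, so the correction term vanishes exactly for the forward-backward combination. The bounds checks mirror the remark that the differences need future or past frames of the track; all names are hypothetical.

```python
def delta(omegas, j, mode):
    """Frequency change per frame, Delta omega_j, by forward or backward difference."""
    if mode == "forward":
        if j + 1 >= len(omegas):
            raise IndexError("forward difference needs the next frame of the track")
        return omegas[j + 1] - omegas[j]
    if mode == "backward":
        if j < 1:
            raise IndexError("backward difference needs the previous frame of the track")
        return omegas[j] - omegas[j - 1]
    raise ValueError(f"unknown mode: {mode}")


def phase_update(phi_prev, omegas, k, hop, mode_prev, mode_cur):
    """phi_k from phi_{k-1} under the assumed linear-frequency model.

    phi_k = phi_{k-1} + (omega_{k-1} + omega_k) N / 2
            + (Delta omega_{k-1} - Delta omega_k) N / 8
    """
    d_prev = delta(omegas, k - 1, mode_prev)
    d_cur = delta(omegas, k, mode_cur)
    return (phi_prev + 0.5 * (omegas[k - 1] + omegas[k]) * hop
            + (d_prev - d_cur) * hop / 8.0)


omegas = [0.10, 0.11, 0.13, 0.16]
for combo in [("forward", "forward"), ("forward", "backward"),
              ("backward", "forward"), ("backward", "backward")]:
    print(combo, phase_update(0.0, omegas, 2, 256, *combo))
```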
[…] generated using the four combinations as […] There are, however, reasons to choose another initialisation point within a track:

• Most sinusoids will not start exactly at the segmentation boundaries. For such a sinusoid […] that a distance is to be kept from the beginning and the end of a track.

• The accumulation of errors will be smaller when the continuation is started from the middle of a track.

• The encoder delay becomes larger when the continuation of the phase is performed from a point later in the track. So a point close to the beginning of a track is preferred. Note that for the decoder no such problems occur. The original phase can be placed at an arbitrary position of a track. This is shown graphically in Figure 25.

Similar to the continuation of the phase one frame forward in time, the continuation of the phase one frame backwards in time can be calculated. So, when the original phase is given at frame k, the continued phase at frame k-1 can be calculated. From this frame, the continued phase at frame k-2 can be calculated, and so on until the birth of the track.

Figure 25: Calculation of phase at birth from original phase at arbitrary position (horizontal axis: frame nr.)

Another set of informal listening tests has been performed in order to assess the optimal place within a track of length L for the original phase. The following configurations have been tested:

1. Original phase at birth of a track: at l = 1, just like the current continuation.

2. Original phase placed in the middle of a track: at l = ⌈L/2⌉.

3. For tracks with L > 10: original phase placed at l = 5; for shorter tracks at l = 1.

4. For tracks with L > 5: original phase placed at l = 3; for shorter tracks at l = 1.

5. For all tracks with L > 1: original phase placed at l = 2, else at l = 1.

Although the differences were quite small, configurations 2, 3, 4 and 5 all performed better than configuration 1, as expected. Among the others no preference could be given; therefore configuration 5 is preferred because of its short delay.
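Configuration 5 and the backward continuation can be combined in a short sketch, assuming Equation 39 in the form φ_k = φ_{k-1} + (ω_{k-1} + ω_k)N/2, which is simply inverted to step backwards in time. The function names are hypothetical; frame indices l are 1-based as in the list of configurations.

```python
def original_phase_frame(track_length):
    """Configuration 5: 1-based frame index l that carries the original phase."""
    return 2 if track_length > 1 else 1


def track_phases(phi_orig, omegas, hop):
    """Phases of all frames of a track, continued forward and backward in time
    from the frame chosen by configuration 5 (omegas in rad/sample, hop = N)."""
    anchor = original_phase_frame(len(omegas)) - 1   # 0-based position
    phases = [0.0] * len(omegas)
    phases[anchor] = phi_orig
    for k in range(anchor + 1, len(omegas)):          # forward in time
        phases[k] = phases[k - 1] + 0.5 * (omegas[k - 1] + omegas[k]) * hop
    for k in range(anchor - 1, -1, -1):               # backward in time (inverted update)
        phases[k] = phases[k + 1] - 0.5 * (omegas[k] + omegas[k + 1]) * hop
    return phases


print(track_phases(5.0, [0.1, 0.1, 0.1, 0.1], 100))  # phase at birth comes out as -5.0
```

Only one frame of look-ahead is needed in the encoder, which is why this configuration keeps the delay short.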

EVALUATION OF TRACKING MECHANISM 

The quality is improved using the current tracking mechanism. However, (near-)transparency does not seem feasible using constraints of the same order as in the improved linking mechanism described above. Stricter linking constraints will, however, increase the bit-rate, which is unwanted.



REMOVAL OF NON-TRACKS 

Non-tracks are sinusoidal components that have not been linked, i.e. their birth and death take place in the same frame. Such components are expensive in terms of bit-rate, while perceptually they are often irrelevant. In order to achieve low bit-rates, their removal is therefore necessary.

From the field of psychoacoustics there is not much knowledge on the perceptual relevance of non-tracks. It is, however, known that sinusoids consisting of less than five periods are not perceived as being sinusoidal. Therefore, in the current removal algorithm all sinusoids with a track-length of one and all tracks consisting of less than five sinusoidal periods are discarded.
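The removal rule can be sketched as follows. The track representation (a list of per-frame frequencies in Hz) and the period count, taken as the sum of frequency times hop duration over the frames of the track, are assumptions made for illustration; the function names are hypothetical.

```python
def track_periods(freqs_hz, hop_s):
    """Approximate number of periods in a track: each frame adds f * hop_s."""
    return sum(f * hop_s for f in freqs_hz)


def keep_track(freqs_hz, hop_s, min_periods=5.0):
    """False for non-tracks (length one) and for tracks shorter than min_periods."""
    if len(freqs_hz) <= 1:
        return False
    return track_periods(freqs_hz, hop_s) >= min_periods


tracks = [[1000.0], [100.0, 100.0], [300.0, 310.0, 320.0]]
print([keep_track(t, 0.01) for t in tracks])  # [False, False, True]
```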

Removal of non-tracks caused a perceptual loss of quality, especially around transient positions. In essence, this means that too many sinusoidal elements are discarded. Two main reasons can be given:

• Transients typically consist of a lot of short sinusoids. This can be seen in spectrograms, where transients typically show up as vertical lines.

• As described above, sinusoids consisting of less than five periods are not perceived as being sinusoidal. This, however, does not mean they may be left out without consequences.

In order to improve the quality of the removal algorithm, a concession towards the bit-rate will have to be made. This is done by means of the inclusion of a psycho-acoustic model. First of all, the masking curve is calculated on a frame-to-frame basis with a low in-band masking. This is shown in Figures 26 and 27.



In Figure 26 the solid lines denote tracks; the crosses denote births and deaths of tracks. Crosses that are not at the beginning or end of a line denote the non-tracks. Figure 27 shows the masking curve of the sinusoidal components of the selected frame of Figure 26. Two sinusoidal components fall under the masking curve. However, only the sinusoidal component that is a non-track is removed.
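The selective removal can be sketched as follows, assuming the masking curve has already been evaluated at the frequency of each sinusoidal component. The component layout (level in dB plus a non-track flag) is hypothetical; the rule itself follows the text: only non-tracks below the masking curve are removed.

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    level_db: float      # level of the sinusoidal component
    is_non_track: bool   # True if birth and death fall in this frame


def prune_frame(components: List[Component], masking_db: List[float]) -> List[Component]:
    """Remove only the non-tracks whose level lies below the masking curve.

    masking_db[i] is the masking threshold at the frequency of components[i].
    Tracked components are kept even when they are masked.
    """
    if len(components) != len(masking_db):
        raise ValueError("one masking threshold per component is required")
    return [c for c, thr in zip(components, masking_db)
            if not (c.is_non_track and c.level_db < thr)]


frame = [Component(40.0, False), Component(20.0, True),
         Component(35.0, True), Component(15.0, False)]
print(prune_frame(frame, [30.0, 25.0, 30.0, 25.0]))
```

As in Figure 27, two components lie below the curve (20 dB and 15 dB), but only the masked non-track is dropped.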

Informal listening tests showed that this technique could be applied with an in-band masking of up to 0 dB without any loss of quality. This, however, means that especially around transients a lot more information (sinusoids) will have to be included, which will lead to a higher bit-rate. This is described, among other subjects, in the next chapter.

Furthermore, the tracks that contain less than five periods are still removed. Informal listening tests showed that this did not further degrade the quality.



Figure 26: Selection of one frame for psycho-acoustic model (horizontal axis: frame nr., frames k-5 to k+5)
Figure 27: Frequency masking curve of sinusoidal components (horizontal axis: frequency in Hz)

All in all, the removal of the non-tracks had to be restricted in order to preserve the audio quality. In other words, fewer sinusoids are removed, and thus more sinusoids have to be encoded. This will always lead to a higher bit-rate.

CONCLUSIONS 



Visual observation of tracks of synthetic signals indicates that the operation of the linking algorithm seems to improve drastically when using information obtained from the second-order polynomial. Experiments also showed that the use of continuous phase, with the current constraints in the linking mechanism, still does not deliver near-transparent quality. Furthermore, as shown in the previous chapter, the improvement of the tracking algorithm costs about 2 kbit/s extra bit-rate. Correct tracking thus costs extra bit-rate.

The second-order polynomial description in the sinusoidal tracking may advantageously be applied in an SSC encoder.
CLAIMS:



1. A parametric coding method, wherein sinusoidal tracks are formed by estimation of sinusoidal parameters per segment and a linking procedure which establishes a similarity between estimated signal components in two subsequent segments, wherein in the linking procedure a linking criterion is used which is based on an error measure of instantaneous signal parameters at a seam between the two subsequent frames.

2. A parametric encoder comprising means for forming sinusoidal tracks by estimation of sinusoidal parameters per segment and a linking procedure which establishes a similarity between estimated signal components in two subsequent segments, wherein in the linking procedure a linking criterion is used which is based on an error measure of instantaneous signal parameters at a seam between the two subsequent frames.